Introduction to Geospatial Graph Networks with OSMNX¶

👋

  • Head of Data Science @ WhiteHat Security
  • Trustee/Tresurer @ Farset Labs

Agenda¶

  • What is a Graph?
  • What is Open Street Map
  • What is OSMNX?

What is a Graph)?¶

a structure amounting to a set of objects [vertices] in which some pairs of the objects are in some sense "related" [joined by edges]

Basically; $G=(V,E)$

For the above graph; $V = \{1,2,3,4,5,6\}$, and $E = \{\{1, 2\}, \{1, 5\}, \{2, 3\}, \{2, 5\}, \{3, 4\}, \{4, 5\}, \{4, 6\}\}$

What can you do with Graphs?¶

Represent pretty much anything you like;

  • Package Dependencies
  • Function Calls
  • Social Networks
  • Biological Systems (protein interactions)

Assuming that your Vertices can store attributes

How are graphs represented?¶

Adjacency Matrix! Row/Column corresponding to the 'index of the node'

$$\begin{bmatrix} 0.00 & 0.00 & 1.00 & 1.00 & 0.00 & 0.00 & 1.00\\ 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00\\ 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00\\ 0.00 & 1.00 & 0.00 & 0.00 & 0.00 & 1.00 & 0.00\\ 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00\\ 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00\\ 1.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 & 0.00 \end{bmatrix} $$

What can you do with Graphs in Python¶

Shockingly little, on your own...

In [3]:
@dataclass
class Node:
    id: str

@dataclass
class Edge:
    a: Node
    b: Node

@dataclass
class Graph:
    vertices: List[Node]
    edges: List[Edge]
In [4]:
nodes = [Node(id) for id in range(1,7)]

edge_indexes = [(1,2),(1,5),(2,3),(2,5),(3,4),(4,5),(4,6)]

edges = [Edge(nodes[a-1],nodes[b-1]) for a,b in edge_indexes]

Graph(nodes,edges)
Out[4]:
Graph(vertices=[Node(id=1), Node(id=2), Node(id=3), Node(id=4), Node(id=5), Node(id=6)], edges=[Edge(a=Node(id=1), b=Node(id=2)), Edge(a=Node(id=1), b=Node(id=5)), Edge(a=Node(id=2), b=Node(id=3)), Edge(a=Node(id=2), b=Node(id=5)), Edge(a=Node(id=3), b=Node(id=4)), Edge(a=Node(id=4), b=Node(id=5)), Edge(a=Node(id=4), b=Node(id=6))])

There must be a better way¶

[NetworkX](https://networkx.org/documentation/stable/)

In [5]:
import networkx as nx
G = nx.Graph()
G.add_edges_from([(1,2),(1,5),(2,3),(2,5),(3,4),(4,5),(4,6)])
G
Out[5]:
<networkx.classes.graph.Graph at 0x112f90b20>
In [6]:
nx.draw_networkx(G)

Factoid; Graph nodes don't have inherant 'positions'¶

NetworkX does it's best to 'jiggle the nodes around' to fit in a particular space, so the final output is non-deterministic in simple graphs

In [7]:
nx.draw_networkx(G)

But you can give it hints/instructions¶

In [8]:
nx.draw_networkx(G, nx.circular_layout(G))

Graphs can be directed or undirected¶

Previous example was undirected

In [9]:
G = nx.DiGraph()
G.add_edges_from([(1,2),(1,5),(2,3),(2,5),(3,4),(4,5),(4,6)])
nx.draw_networkx(G)

Graphs can loop!¶

In [10]:
G = nx.DiGraph()
G.add_edges_from([(1,2),(1,5),(2,3),(2,5),(3,4),(4,5),(4,6)])
G.add_edge(1,1)
nx.draw_networkx(G, nx.kamada_kawai_layout(G)) # this just looks better

Edges can have funky features¶

It just takes a little bit more fiddling

In [11]:
G = nx.Graph()
G.add_edge(1, 2, color='r' ,weight=3)
G.add_edge(2, 3, color='b', weight=5)
G.add_edge(2, 4, color='y', weight=7)
G.add_edge(3, 4, color='g', weight=12)

nx.draw_networkx(G, nx.circular_layout(G), 
        edge_color=list(nx.get_edge_attributes(G,'color').values()), 
        width=list(nx.get_edge_attributes(G,'weight').values()))

Nodes don't have to be numbers/strings!¶

Any hashable object will do

(ditto edges)

In [12]:
from enum import Enum
from typing import Optional

@dataclass(frozen=True)
class Person:
    name:str
    age:int
    def __repr__(self):
        return f'{self.name}({self.age})'
        
        
class Relation(Enum):
    FRIEND = 1
    FOE = 2
In [13]:
people = [Person('Joe',79),Person('Vlad', 69),Person('Wlad', 44)]
G = nx.Graph()
G.add_nodes_from(people)
G.add_edge(people[0],people[1], object=Relation.FOE)
G.add_edge(people[0],people[2], object=Relation.FRIEND)
G.add_edge(people[1],people[2], object=Relation.FOE)

G.nodes
Out[13]:
NodeView((Joe(79), Vlad(69), Wlad(44)))
In [14]:
for relationship in G.edges:
    print(relationship, str(G.get_edge_data(*relationship)['object']))
(Joe(79), Vlad(69)) Relation.FOE
(Joe(79), Wlad(44)) Relation.FRIEND
(Vlad(69), Wlad(44)) Relation.FOE

Ok, but why?¶

Have you heard of PageRank? Back-links between 7 example websites; which is most important?

In [15]:
import networkx as nx
import pandas as pd

G = nx.DiGraph()

[G.add_node(k) for k in ["A", "B", "C", "D", "E", "F", "G"]]
G.add_edges_from([('G','A'), ('A','G'),('B','A'),('C','A'),('A','C'),('A','D'),
                  ('E','A'),('F','A'),('D','B'),('D','F')])
nx.draw_networkx(G, 
                 nx.spring_layout(G),
                 with_labels = True)

Probably A, but by how much?¶

In [16]:
pr = nx.pagerank(G)
pd.Series(pr).plot.bar(title='PageRank')
Out[16]:
<AxesSubplot:title={'center':'PageRank'}>

Using PR to hint for placement/display¶

In [17]:
nx.draw_networkx(G, nx.spring_layout(G, weight=pr.values()),
                 with_labels = True,
                 node_color=list(pr.values()))

Neighbours, Commuities, Clusters and Cliques¶

Zachary's karate club is a social network of a university karate club [...] The network captures 34 members of a karate club, documenting links between pairs of members who interacted outside the club.

Node 0 is the Instructor, and Node 33 is the club president (yey zero indexing)

In [18]:
G=nx.karate_club_graph()
nx.draw_networkx(G,nx.kamada_kawai_layout(G), with_labels=True)

Centrality¶

For every pair of vertices in a connected graph, there exists at least one shortest path between the vertices such that either the number of edges that the path passes through (for unweighted graphs) or the sum of the weights of the edges (for weighted graphs) is minimized. The betweenness centrality for each vertex is the number of these shortest paths that pass through the vertex...

...Or, my version

How much of a pain would it be if this edge/node was removed?

Centrality Example¶

Nodes at the 'edge' are less 'central'

Node Centrality....¶

In [19]:
nx.draw_networkx(G, 
        with_labels = True,
        node_color=list(nx.betweenness_centrality(G).values()),
        cmap=plt.cm.cool)#blue->purple

Edge Centrality...¶

In [20]:
nx.draw_networkx(G, 
        with_labels = True,
        edge_color=list(nx.edge_betweenness_centrality(G).values()),
        edge_cmap=plt.cm.hot_r)#Yellow->Red->Black

But what can this tell us?¶

Cliques!

Girvan-Newman algorithm finds the 'most valuable edge' (most central), removes it, and repeats until it breaks the graph into suitable 'communities'.

But what can this tell us?¶

Cliques!

Girvan-Newman algorithm finds the 'most valuable edge' (most central), removes it, and repeats until it breaks the graph into suitable 'communities'.

In [21]:
from networkx.algorithms.community.centrality import girvan_newman

G = nx.karate_club_graph()
communities = girvan_newman(G)
for iteration in communities:
    if len(iteration) > 4:
        break
    else:
        print(f'{len(iteration)} communities, of sizes {[len(c) for c in iteration]}')
2 communities, of sizes [15, 19]
3 communities, of sizes [15, 18, 1]
4 communities, of sizes [10, 18, 5, 1]

From this we can see that after cutting into two communities, you just end up with tiny 'cliques' beyond that

What does it look like then clever clogs?¶

Looks to me like the coach (0) and the president (33) don't get along...

In [22]:
communities = next(girvan_newman(G))#just the first one
nx.draw_networkx(G, 
        with_labels = True,
        node_color = [['g','r'][node in communities[0]] for node in G])

Graph Summary¶

Everything is a graph!

What is Open Street Map¶

A Free, Editable Map of the whole world, built by over 8 Million volunteers and released with an Open Content License

Consists of a network of:

  • Nodes
  • Ways
  • Relations

Nodes¶

It consists of a single point in space defined by its latitude, longitude and node id.

Ways¶

A way normally represents a linear feature on the ground (such as a road, wall, or river).

Relations¶

It is used to define logical or geographic relationships between these different objects (for example a lake and its island, or several roads for a bus route).

Remind you of anything?¶

It's just a graph!

Enter OSMNX¶

Combination of

  • GeoPandas
  • OpenStreetMap API
  • NetworkX

Lets start off with the simple stuff first¶

Plotting Geo Boundaries;

In [23]:
import osmnx as ox
area = ox.geocode_to_gdf('County Antrim, Northern Ireland')
ax = ox.project_gdf(area).plot() # gdf->'GeoDataFrame'

So what's under the hood here?¶

In [24]:
ox.project_gdf(area).T
Out[24]:
0
geometry (POLYGON ((648360.629882516 6119342.555832483,...
bbox_north 55.313086
bbox_south 54.477277
bbox_east -5.687963
bbox_west -6.668989
place_id 284858948
osm_type relation
osm_id 1119534
lat 54.864726
lon -6.143638
display_name County Antrim, Northern Ireland, United Kingdom
class place
type county
importance 0.944822
In [25]:
ox.project_gdf(area).T[0].geometry.area # m^2
Out[25]:
3095315737.903886

United Blobs¶

Since these are administrative boundaries they don't really care about land all the time...

In [26]:
countries = ox.geocode_to_gdf(['Northern Ireland','Ireland','England','Scotland','Wales'])
ax = ox.project_gdf(countries).plot()

Take it to the streets¶

Streets are just Nodes and Ways!

osmnx allows you to query for a graph of these points instead of a GeoDataFrame, and it's just a networkx Graph!

In [27]:
G = ox.graph_from_point((54.5973, -5.9301), dist=1000, dist_type='bbox', network_type="all") # Belfast
G,len(G.nodes), len(G.edges)
Out[27]:
(<networkx.classes.multidigraph.MultiDiGraph at 0x1b7c88190>, 2318, 5643)
In [28]:
fig,ax = ox.plot_graph(ox.project_graph(G))

What's that network_type field?¶

  • ‘drive’ – get drivable public streets (but not service roads)
  • ‘drive_service’ – get drivable public streets, including service roads
  • ‘walk’ – get all streets and paths that pedestrians can use (this network type ignores one-way directionality)
  • ‘bike’ – get all streets and paths that cyclists can use
  • ‘all’ – download all (non-private) OSM streets and paths
  • ‘all_private’ – download all OSM streets and paths, including private-access ones

🚗¶

Get only the drivable "ways"

In [29]:
G=ox.graph_from_point(points['belfast'], dist=1000, dist_type='bbox', network_type="drive")
ox.plot_graph(ox.project_graph(G))
Out[29]:
(<Figure size 576x576 with 1 Axes>, <AxesSubplot:>)

What else is in there?¶

(Notice we're back to all again!)

In [31]:
G=ox.graph_from_point(points['belfast'], dist=1000, dist_type='bbox', network_type="all")
df = edge_df(G) # this is hidden; checkout the notebook!
df.head()
Out[31]:
u v key osmid ref name highway maxspeed oneway length geometry lanes access bridge width service tunnel junction area
0 351599 351600 0 5209408 B503 Sandy Row secondary 30 mph False 68.019 LINESTRING (-5.9369028 54.5913308, -5.9366651 ... NaN NaN NaN NaN NaN NaN NaN NaN
1 351599 1121792828 0 5209408 B503 Sandy Row secondary 30 mph False 58.732 LINESTRING (-5.9369028 54.5913308, -5.9370089 ... NaN NaN NaN NaN NaN NaN NaN NaN
2 351599 455032602 0 38485621 NaN NaN residential NaN False 89.353 LINESTRING (-5.9369028 54.5913308, -5.937854 5... NaN NaN NaN NaN NaN NaN NaN NaN
3 351600 231772694 0 5209408 B503 Sandy Row secondary 30 mph False 16.131 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 351600 351599 0 5209408 B503 Sandy Row secondary 30 mph False 68.019 LINESTRING (-5.9366385 54.591923, -5.9366651 5... NaN NaN NaN NaN NaN NaN NaN NaN
In [32]:
df.highway.value_counts()
Out[32]:
service                                 1468
residential                              934
footway                                  862
unclassified                             553
primary                                  399
tertiary                                 248
path                                     236
pedestrian                               206
secondary                                205
cycleway                                 126
living_street                             72
[residential, footway]                    64
trunk_link                                33
[steps, footway]                          24
steps                                     20
[service, unclassified]                   18
[footway, service]                        18
secondary_link                            18
tertiary_link                             11
[residential, path]                       10
motorway_link                              9
[residential, service]                     8
trunk                                      8
[footway, cycleway]                        8
[service, pedestrian]                      6
corridor                                   6
[path, living_street]                      6
[residential, pedestrian]                  6
[footway, living_street]                   6
primary_link                               6
[footway, unclassified]                    4
[path, cycleway]                           4
[footway, pedestrian]                      4
[footway, corridor]                        4
[corridor, footway]                        4
[footway, steps, corridor]                 2
[path, footway]                            2
[steps, pedestrian, living_street]         2
[steps, path]                              2
[path, unclassified]                       2
[corridor, pedestrian]                     2
[corridor, steps, footway]                 2
[residential, living_street]               2
[steps, path, pedestrian]                  2
[residential, unclassified]                2
[residential, footway, unclassified]       2
[steps, path, footway]                     1
[steps, footway, path]                     1
[pedestrian, cycleway]                     1
[cycleway, pedestrian]                     1
[cycleway, unclassified]                   1
[unclassified, cycleway]                   1
motorway                                   1
Name: highway, dtype: int64

Lets make a mess of this dataframe!¶

Un-listify these highway types... and other mess that you'll have to check out the notebook to review!

In [33]:
highway_types = Counter()
for hw in df.highway.values:
    if isinstance(hw, list):
        highway_types.update(hw)
    else:
        highway_types[hw]+=1
        
for hw in highway_types:
    df[f'is_{hw}']=df.highway.apply(partial(un_listify, sentinel=hw))
    
highway_types.most_common(5)
Out[33]:
[('service', 1518),
 ('residential', 1028),
 ('footway', 1008),
 ('unclassified', 583),
 ('primary', 399)]

Paint me a picture¶

So... where are the actual dedicated cycleways in Belfast?

In [34]:
fig, ax=ox.plot_graph(ox.project_graph(G),
             edge_color=df.is_cycleway.apply(lambda hw: ['w','g'][hw]),
             edge_linewidth=df.is_cycleway.apply(lambda hw: [0.5,2][hw]),
             node_size=0
)

Back to graph stuff¶

Where are the most connected (i.e. central, i.e. 'critical') junctions in/around belfast?)

(Just using the drive network now...), and this time with a much bigger map! (Be careful if you're doing this at home; it's a chonky boi)

In [35]:
G=ox.graph_from_point(points['belfast'], dist=3_000, dist_type='bbox', network_type="drive", simplify=True)
fig, ax=ox.plot_graph(ox.project_graph(G))
In [36]:
df_n = node_df(G)
df_e = edge_df(G)
for hw in highway_types:
    df_e[f'is_{hw}']=df_e.highway.apply(partial(un_listify, sentinel=hw))
In [37]:
df_e.highway.value_counts()
Out[37]:
residential                     8157
tertiary                        1217
primary                         1138
secondary                        740
unclassified                     656
living_street                    388
trunk                             56
trunk_link                        52
secondary_link                    38
[residential, living_street]      30
motorway_link                     25
tertiary_link                     23
primary_link                      17
motorway                          12
[residential, unclassified]        8
[unclassified, primary]            1
Name: highway, dtype: int64
In [38]:
centrality = nx.betweenness_centrality(nx.DiGraph(G),weight='length') # This is the expensive boi!
df_n['centrality'] = df_n['key'].apply(centrality.get)
In [39]:
norm = colors.Normalize(vmin=df_n['centrality'].min(), vmax=df_n['centrality'].max())

centrality_size = df_n['centrality'].apply(norm)*128
centrality_colors = df_n['centrality'].apply(norm).apply(cmap).values
In [40]:
fig, ax=ox.plot_graph(G, node_size=centrality_size, node_color=centrality_colors)

The big Cahuna¶

  • Road Widths actually represented by number of lanes
  • Edge Colouring by Highway type
  • Node Coloring and Size on Centrality

(See the notebook for the dirt...)

In [42]:
fig, ax = ox.plot_graph(G, 
                        edge_color=highway_color, 
                        edge_linewidth=display_width,
                        node_size=centrality_size, node_color=centrality_colors
)

Walking the graph¶

Get real world 'shortest path' distance from one point to another

In [43]:
center_node = ox.get_nearest_node(G, at['home'])
path_lengths = nx.single_source_dijkstra_path_length(G, center_node, weight='length')
df_n['distance'] = df_n['key'].apply(path_lengths.get)
norm = colors.Normalize(vmin=df_n['distance'].min(), vmax=df_n['distance'].max())
distance_colours = df_n['distance'].apply(norm).apply(plt.cm.get_cmap('jet_r')).values
In [44]:
fig,ax = ox.plot_graph(G, node_color=distance_colours, 
                       node_size = [[10,500][node==center_node] for node in G.nodes])

Create a walking distance map with isochrones¶

Subdivide graphs into regions of walkability

In [45]:
# add an edge attribute for time in minutes required to traverse each edge
trip_times = [ 10, 15, 30, 45] #in minutes
travel_speed = 5 #walking speed in km/hour
meters_per_minute = travel_speed * 1000 / 60 #km per hour to m per minute
for u, v, k, data in G.edges(data=True, keys=True):
    data['time'] = data['length'] / meters_per_minute
In [46]:
# make the isochrone polygons
isochrone_polys = []
for trip_time in sorted(trip_times, reverse=True):
    subgraph = nx.ego_graph(G, center_node, radius=trip_time, distance='time')
    node_points = [Point((data['x'], data['y'])) for node, data in subgraph.nodes(data=True)]
    bounding_poly = gpd.GeoSeries(node_points).unary_union.convex_hull
    isochrone_polys.append(bounding_poly)
In [47]:
# plot the network then add isochrones as colored descartes polygon patches
from descartes import PolygonPatch
fig, ax = ox.plot_graph(G, show=False, close=False, edge_color='k', edge_alpha=0.2)


norm = colors.LogNorm(vmin=min(trip_times), vmax=max(trip_times))
cmap=plt.cm.get_cmap('cool', len(trip_times))

handles = []

labels=[]
for polygon, trip_time in zip(isochrone_polys, reversed(trip_times)):
    patch = PolygonPatch(polygon, fc=cmap(norm(trip_time)), ec='none', alpha=0.6, zorder=-1, label="{}min walk from Node".format(trip_time))
    ax.add_patch(patch)
    handles.append(patch)
_=plt.legend(handles=handles)
In [48]:
fig.show()

More than just nodes; Bearings¶

In [49]:
n = len(points)
ncols = int(np.ceil(np.sqrt(n)))
nrows = int(np.ceil(n / ncols))
fig, axes = plt.subplots(nrows, ncols, figsize=figsize, subplot_kw={"projection": "polar"})

for ax,(place,coords) in zip(axes.flat,points.items()):
    G = ox.graph_from_point(coords,dist=1000, dist_type='bbox', network_type='drive')
    Gu = ox.add_edge_bearings(ox.get_undirected(G))
    fig,ax = ox.plot_orientation(Gu, title=place, area=True, ax=ax)

What can you do with this?¶

Designed Hoodies for WhiteHat Security

In [50]:
Image('vantage_hoodie_bolster.png')
Out[50]:

Laser Cutting¶

Using Farset Labs awesome GlowForge laser cutter!

(Also thanks to AlertLogic for "donating" the plastic... kinda...)

In [51]:
youtube_video('xD0mwsZcI0A').display()

Sold Pillows on Redbubble!¶

In [52]:
Image('redbubble.png')
Out[52]:

Recommended Links¶

  • Networkx.guide
In [ ]: